The plan for the live coding is for us to work together to solve problems using tools from this week’s lessons. This week’s skills are making slide presentations, checking data and calculations, and producing dynamic graphics.
Today I will work through examples in a separate document. Watch the live session (or its recording) for more information.
When you save your R presentation, you should see a preview appear. The preview window has a pop-up menu with a “save as webpage” option. Use this to create a file that can be viewed as a slideshow from a web browser.
Let’s use the penguins and penguins_raw data.
First get an overview of the data. Use functions from the dlookr package: describe, diagnose, diagnose_category. This is a carefully cleaned data set, so there are probably no obvious problems with it.
describe(penguins_raw)
## # A tibble: 7 x 26
## variable n na mean sd se_mean IQR skewness kurtosis p00
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sample … 344 0 63.2 40.4 2.18 6.62e+1 0.351 -0.926 1
## 2 Culmen … 342 2 43.9 5.46 0.295 9.27e+0 0.0531 -0.876 32.1
## 3 Culmen … 342 2 17.2 1.97 0.107 3.10e+0 -0.143 -0.907 13.1
## 4 Flipper… 342 2 201. 14.1 0.760 2.30e+1 0.346 -0.984 172
## 5 Body Ma… 342 2 4202. 802. 43.4 1.20e+3 0.470 -0.719 2700
## 6 Delta 1… 330 14 8.73 0.552 0.0304 8.72e-1 0.239 -0.748 7.63
## 7 Delta 1… 331 13 -25.7 0.794 0.0436 1.26e+0 0.338 -1.03 -27.0
## # … with 16 more variables: p01 <dbl>, p05 <dbl>, p10 <dbl>, p20 <dbl>,
## # p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
## # p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>
diagnose(penguins_raw)
## # A tibble: 17 x 6
## variables types missing_count missing_percent unique_count unique_rate
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 studyName charac… 0 0 3 0.00872
## 2 Sample Number numeric 0 0 152 0.442
## 3 Species charac… 0 0 3 0.00872
## 4 Region charac… 0 0 1 0.00291
## 5 Island charac… 0 0 3 0.00872
## 6 Stage charac… 0 0 1 0.00291
## 7 Individual ID charac… 0 0 190 0.552
## 8 Clutch Comple… charac… 0 0 2 0.00581
## 9 Date Egg Date 0 0 50 0.145
## 10 Culmen Length… numeric 2 0.581 165 0.480
## 11 Culmen Depth … numeric 2 0.581 81 0.235
## 12 Flipper Lengt… numeric 2 0.581 56 0.163
## 13 Body Mass (g) numeric 2 0.581 95 0.276
## 14 Sex charac… 11 3.20 3 0.00872
## 15 Delta 15 N (o… numeric 14 4.07 331 0.962
## 16 Delta 13 C (o… numeric 13 3.78 332 0.965
## 17 Comments charac… 290 84.3 11 0.0320
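To see what the diagnose columns mean, here is a minimal base R sketch that computes the same six columns by hand. The data frame toy and the helper diagnose_by_hand are made up for illustration; this is not how dlookr is implemented, only a reading of the output above (note that a missing value counts as one of the unique values, as it does for Sex).

```r
# Hypothetical toy data standing in for penguins_raw
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", NA),
  mass_g  = c(3750, NA, 5000, 3800, 3750),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per variable, mirroring the diagnose() columns
diagnose_by_hand <- function(df) {
  n <- nrow(df)
  data.frame(
    variables       = names(df),
    types           = vapply(df, function(x) class(x)[1], character(1)),
    missing_count   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    missing_percent = vapply(df, function(x) 100 * mean(is.na(x)), numeric(1)),
    # unique() keeps NA as its own value, so NA is counted here
    unique_count    = vapply(df, function(x) length(unique(x)), integer(1)),
    unique_rate     = vapply(df, function(x) length(unique(x)) / n, numeric(1)),
    row.names = NULL
  )
}

result <- diagnose_by_hand(toy)
print(result)
```

A variable with a unique_rate close to 1 is probably an identifier or a continuous measurement; one with a very small rate is probably categorical.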
diagnose_category(penguins)
## # A tibble: 9 x 6
## variables levels N freq ratio rank
## <chr> <fct> <int> <int> <dbl> <int>
## 1 species Adelie 344 152 44.2 1
## 2 species Gentoo 344 124 36.0 2
## 3 species Chinstrap 344 68 19.8 3
## 4 island Biscoe 344 168 48.8 1
## 5 island Dream 344 124 36.0 2
## 6 island Torgersen 344 52 15.1 3
## 7 sex male 344 168 48.8 1
## 8 sex female 344 165 48.0 2
## 9 sex <NA> 344 11 3.20 3
diagnose_category(diamonds)
## # A tibble: 20 x 6
## variables levels N freq ratio rank
## <chr> <ord> <int> <int> <dbl> <int>
## 1 cut Ideal 53940 21551 40.0 1
## 2 cut Premium 53940 13791 25.6 2
## 3 cut Very Good 53940 12082 22.4 3
## 4 cut Good 53940 4906 9.10 4
## 5 cut Fair 53940 1610 2.98 5
## 6 color G 53940 11292 20.9 1
## 7 color E 53940 9797 18.2 2
## 8 color F 53940 9542 17.7 3
## 9 color H 53940 8304 15.4 4
## 10 color D 53940 6775 12.6 5
## 11 color I 53940 5422 10.1 6
## 12 color J 53940 2808 5.21 7
## 13 clarity SI1 53940 13065 24.2 1
## 14 clarity VS2 53940 12258 22.7 2
## 15 clarity SI2 53940 9194 17.0 3
## 16 clarity VS1 53940 8171 15.1 4
## 17 clarity VVS2 53940 5066 9.39 5
## 18 clarity VVS1 53940 3655 6.78 6
## 19 clarity IF 53940 1790 3.32 7
## 20 clarity I1 53940 741 1.37 8
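Next, compute summary statistics (mean, median, etc.) for variables that contain missing data. Here mean() is given na.rm = TRUE so that missing values are dropped, and the missing and complete observations are counted two ways: with skimr helpers and by hand with is.na().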
penguins_raw %>%
  group_by(Species) %>%
  summarize(mean_flipper_length = mean(`Flipper Length (mm)`, na.rm = TRUE),
            # counts from skimr
            n_missing = skimr::n_missing(`Flipper Length (mm)`),
            n_complete = skimr::n_complete(`Flipper Length (mm)`),
            n = n(),
            # the same counts computed by hand
            my_n_missing = sum(is.na(`Flipper Length (mm)`)),
            m_n_complete = sum(!is.na(`Flipper Length (mm)`)))
## # A tibble: 3 x 7
## Species mean_flipper_le… n_missing n_complete n my_n_missing m_n_complete
## * <chr> <dbl> <int> <int> <int> <int> <int>
## 1 Adelie … 190. 1 151 152 1 151
## 2 Chinstr… 196. 0 68 68 0 68
## 3 Gentoo … 217. 1 123 124 1 123
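Why na.rm = TRUE matters: a minimal base R sketch with a made-up vector of flipper lengths (the values are hypothetical, not taken from penguins_raw). Without na.rm, a single missing value makes the whole mean missing.

```r
# Hypothetical flipper lengths in mm, with one missing value
flipper <- c(181, 186, NA, 195, 193)

mean_with_na <- mean(flipper)                 # NA: the missing value propagates
mean_flipper <- mean(flipper, na.rm = TRUE)   # mean of the 4 observed values

# Count missing and complete observations by hand
n_missing  <- sum(is.na(flipper))
n_complete <- sum(!is.na(flipper))

print(c(mean = mean_flipper, n_missing = n_missing, n_complete = n_complete))
```

Reporting n_missing next to the mean tells the reader how many observations the mean is actually based on.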
penguins_raw %>%
  filter(!is.na(Sex)) %>%   # drop penguins with unknown sex
  group_by(Species, Sex) %>%
  summarize(mean_flipper_length = mean(`Flipper Length (mm)`, na.rm = TRUE),
            n_missing = skimr::n_missing(`Flipper Length (mm)`),
            n_complete = skimr::n_complete(`Flipper Length (mm)`),
            n = n(),
            my_n_missing = sum(is.na(`Flipper Length (mm)`)),
            m_n_complete = sum(!is.na(`Flipper Length (mm)`)))
## `summarise()` has grouped output by 'Species'. You can override using the `.groups` argument.
## # A tibble: 6 x 8
## # Groups: Species [3]
## Species Sex mean_flipper_le… n_missing n_complete n my_n_missing
## <chr> <chr> <dbl> <int> <int> <int> <int>
## 1 Adelie Penguin… FEMA… 188. 0 73 73 0
## 2 Adelie Penguin… MALE 192. 0 73 73 0
## 3 Chinstrap peng… FEMA… 192. 0 34 34 0
## 4 Chinstrap peng… MALE 200. 0 34 34 0
## 5 Gentoo penguin… FEMA… 213. 0 58 58 0
## 6 Gentoo penguin… MALE 222. 0 61 61 0
## # … with 1 more variable: m_n_complete <int>
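The message about grouped output can be avoided by setting the .groups argument it mentions. Here is a small sketch on a made-up tibble (toy and its values are hypothetical); .groups = "drop" returns an ungrouped result, which is usually what we want for a summary table.

```r
library(dplyr)

# Hypothetical toy data: one unknown sex, one missing flipper length
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo", "Gentoo"),
  sex     = c("FEMALE", "MALE", NA, "FEMALE", "FEMALE", "MALE", "MALE"),
  flipper = c(185, 192, 190, NA, 212, 221, 223)
)

by_sex <- toy %>%
  filter(!is.na(sex)) %>%            # drop rows with unknown sex
  group_by(species, sex) %>%
  summarize(mean_flipper = mean(flipper, na.rm = TRUE),
            n_missing = sum(is.na(flipper)),
            n = n(),
            .groups = "drop")        # ungrouped output, no message

print(by_sex)
```

Filtering on sex removes only the rows where sex is unknown; rows with a missing flipper length are still in the data, which is why na.rm = TRUE is still needed.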
Use plotly to make a scatterplot. This creates an interactive HTML “widget” that lets the user hover over points to see their values, and zoom and pan the plot.
plot_ly(penguins, x = ~ body_mass_g, y = ~ flipper_length_mm, color = ~ species)
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning: Ignoring 2 observations
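The messages above appear because the trace type and mode were left for plotly to guess, and the warning appears because two rows have missing values. A sketch that states both explicitly and drops incomplete rows first, using a made-up data frame (toy is hypothetical and stands in for penguins):

```r
library(plotly)

# Hypothetical toy data standing in for penguins; one row has a missing value
toy <- data.frame(
  body_mass_g       = c(3750, 3800, NA, 5000, 5400, 3700),
  flipper_length_mm = c(181, 186, 190, 217, 221, 195),
  species           = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap")
)

# Drop incomplete rows ourselves so plotly has nothing to ignore
toy_complete <- toy[complete.cases(toy), ]

# Name the trace type and mode instead of letting plotly infer them
p <- plot_ly(toy_complete,
             x = ~body_mass_g, y = ~flipper_length_mm, color = ~species,
             type = "scatter", mode = "markers")
```

Print p (or let it be the last line of an R Markdown chunk) to display the widget.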
The gganimate package allows you to convert a regular ggplot into a series of frames which are then animated. Several functions can be used to move (transition) from one frame to the next:
transition_states: uses a variable to define a partitioning of the data; a bit like a temporal version of “colour”
transition_time: splits the data by a quantitative variable and uses the value of the variable to time the movement through the data
transition_events: requires start and end times for each event (each row of the data)
We’ll use the penguin or gapminder data to generate some animations. The transition_* help pages have some great examples.
gapminder %>% ggplot(aes(year, lifeExp)) + geom_line(aes(group=country)) +
transition_manual(continent)
## nframes and fps adjusted to match transition
Many lines and then a line with changing slope:
df <- tibble(intercept = 0:10,
slope = (-5:5)/5,
group = 0:10)
df %>% ggplot() +
geom_abline(aes(intercept = intercept, slope = slope)) +
xlim(-10, 10) +
ylim(-2, 12)
Now animate, using three different transitions in turn:
df %>% ggplot() +
geom_abline(aes(intercept = intercept, slope = slope)) +
xlim(-10, 10) +
ylim(-2, 12) +
transition_time(group)
df %>% ggplot() +
geom_abline(aes(intercept = intercept, slope = slope)) +
xlim(-10, 10) +
ylim(-2, 12) +
transition_states(group)
df %>% ggplot() +
geom_abline(aes(intercept = intercept, slope = slope)) +
xlim(-10, 10) +
ylim(-2, 12) +
transition_manual(group)
## nframes and fps adjusted to match transition
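The three versions above differ in what happens between the rows of df: transition_manual shows exactly one frame per value of group, while transition_time and transition_states fill in intermediate frames. Here is a rough base R sketch of that filling-in, assuming simple linear interpolation; tween_lines is a hypothetical helper written for illustration, not a gganimate function, and gganimate's own interpolation has more options than this.

```r
# Same data as above, as a plain data frame
df <- data.frame(intercept = 0:10,
                 slope = (-5:5) / 5,
                 group = 0:10)

# Hypothetical helper: linearly interpolate the line parameters at n_frames
# equally spaced times between the smallest and largest value of group
tween_lines <- function(df, n_frames) {
  t <- seq(min(df$group), max(df$group), length.out = n_frames)
  data.frame(
    time      = t,
    intercept = approx(df$group, df$intercept, xout = t)$y,
    slope     = approx(df$group, df$slope, xout = t)$y
  )
}

# 21 frames: the 11 original lines plus one in-between line for each gap
frames <- tween_lines(df, n_frames = 21)
head(frames, 3)
```

With more frames the line appears to rotate smoothly; with exactly 11 frames we get back the original rows, which is what the manual transition shows.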
Try this for penguins:
penguins %>% ggplot(aes(body_mass_g, flipper_length_mm, color = sex)) + geom_point() +
transition_states(species) # transition_manual(species)
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).